QTM 350 - Data Science Computing

Lecture 19 - Tables and Dates in SQL

Danilo Freire

Emory University

Hello again, my friends! 😊

Brief recap 📚

Recap of last class and today’s plan

Last time we learned how to:

  • Connect SQL with Python with sqlite3 and pandas
  • Use many SQL commands, such as CASE WHEN, window functions, and string functions
  • Fill missing data with COALESCE and CASE WHEN
  • Use pandas to write and run SQL queries
  • Pivot tables in SQLite

Today we will learn how to:

  • See different types of join in SQL
  • Use special joins, such as CROSS JOIN and SELF JOIN
  • Merge tables by row with UNION, INTERSECT, and EXCEPT
  • Use join conditions with ON, USING, and NATURAL JOIN
  • Solve exercises to practice what we learned
  • Let’s get started! 🚀

An announcement 📢

Final project instructions

  • The instructions for the final project is now available on GitHub
  • Please find it here: https://github.com/danilofreire/qtm350/blob/main/project/project-instructions.pdf
  • The project is due on 28th April, 2025
  • Groups of 3-4 students
  • You will create a GitHub repository with a report based on World Bank data
  • The report should be in Quarto, the data cleaning and descriptive statistics in SQL, and the data analysis and visualisation in Python
  • QTM 531 students will have a different project
  • Thanks to those who have already sent me their groups!
  • Please let me know if you have any questions!

Basic joins 📊

Primary and foreign keys

  • As with many languages, you can merge two tables in SQL either by columns or by rows
  • The most common method is the JOIN clause
  • JOIN is used to combine rows and columns from two or more tables based on a related column between them
  • As you know, there are two types of keys, primary and foreign keys
  • The primary key is a column that uniquely identifies each row in a table
  • A foreign key is a column that identifies a column in another table
    • One table can have multiple foreign keys, and they can be NULL
    • SQLite supports foreign keys, and the Python sqlite3 module handles them correctly by default

Load the libraries and connect to the database

  • Let’s load the libraries and connect to the SQLite database. We’ll use a file named lecture19.db.
import pandas as pd; import sqlite3

# Connect to the SQLite database (this will create the file if it doesn't exist)
connection = sqlite3.connect('lecture19.db'); cursor = connection.cursor()

cursor.execute('DROP TABLE IF EXISTS players;')
cursor.execute('''
CREATE TABLE players (
    player_id INTEGER PRIMARY KEY AUTOINCREMENT, player_name TEXT NOT NULL,
    goals INT NOT NULL, victories INT NOT NULL
);
''')

cursor.execute('DROP TABLE IF EXISTS teams;')
cursor.execute('''
CREATE TABLE teams (
    team_id INTEGER PRIMARY KEY AUTOINCREMENT, team_name TEXT NOT NULL
);
''')
connection.commit() # Commit changes

Create the tables

  • Now let’s insert some data into the tables!
# Insert data into the tables
cursor.execute('''
INSERT INTO players (player_name, goals, victories) VALUES
('Messi', 10, 5),
('Vini Jr', 8, 4),
('Neymar', 6, 3),
('Mbappe', 5, 2),
('Lewandowski', 4, 1),
('Haaland', 5, 3);
''')

cursor.execute('''
INSERT INTO teams (team_name) VALUES
('Inter Miami'),
('Real Madrid'),
('Al Hilal'),
('Real Madrid'),
('Bayern');
''')
connection.commit() # Commit changes

Visualise the tables

  • Let’s see our tables using pandas.
  • read_sql works fine with the sqlite3 connection object.
pd.read_sql('SELECT * FROM players', connection)
player_id player_name goals victories
0 1 Messi 10 5
1 2 Vini Jr 8 4
2 3 Neymar 6 3
3 4 Mbappe 5 2
4 5 Lewandowski 4 1
5 6 Haaland 5 3
pd.read_sql('SELECT * FROM teams', connection)
team_id team_name
0 1 Inter Miami
1 2 Real Madrid
2 3 Al Hilal
3 4 Real Madrid
4 5 Bayern

Types of joins

Inner join

  • The INNER JOIN returns only the records where there is a match between both tables (intersection) based on the join condition.
  • If there’s no match for a record in either table, that record will be excluded from the results.
  • Each match creates a new row in the result set that combines columns from both tables.
  • The matching condition is specified in the ON clause (e.g., ON table1.id = table2.id).
  • It is commonly used to combine related data, like the ones we have in the players and teams tables.
  • The syntax is standard SQL:
  • SELECT columns FROM table1 INNER JOIN table2 ON table1.column = table2.column
  • Note that Haaland (player_id 6) is not in the teams table (which only has 5 rows), so he will not appear in the result set.
pd.read_sql('''
SELECT players.player_name, teams.team_name, players.goals, players.victories
FROM players
INNER JOIN teams
ON players.player_id = teams.team_id;
''', connection)
player_name team_name goals victories
0 Messi Inter Miami 10 5
1 Vini Jr Real Madrid 8 4
2 Neymar Al Hilal 6 3
3 Mbappe Real Madrid 5 2
4 Lewandowski Bayern 4 1

Left join

  • The LEFT JOIN returns all records from the left table (first table) and the matched records from the right table (second table).
  • The result is NULL for columns from the right side if there is no match.
  • This is perhaps the most common type of join, as it keeps all the data from the table we are usually primarily interested in (the “left” table).
  • The syntax is as follows (pretty much the same as INNER JOIN):
  • SELECT columns FROM table1 LEFT JOIN table2 ON table1.column = table2.column
  • Note that Haaland is included here because he is in the players table (the left table), even though he has no matching team_id in the teams table. His team_name will be NULL.
pd.read_sql('''
SELECT players.player_name, teams.team_name, players.goals
FROM players
LEFT JOIN teams
ON players.player_id = teams.team_id;
''', connection)
player_name team_name goals
0 Messi Inter Miami 10
1 Vini Jr Real Madrid 8
2 Neymar Al Hilal 6
3 Mbappe Real Madrid 5
4 Lewandowski Bayern 4
5 Haaland None 5

Right join

  • The RIGHT JOIN returns all records from the right table (second table) and the matched records from the left table (first table).
  • The result is NULL for columns from the left side if there is no match.
  • As you have probably guessed, this is the opposite of the LEFT JOIN (and less common).
  • SELECT columns FROM table1 RIGHT JOIN table2 ON table1.column = table2.column
  • In our case, since teams has fewer rows than players and all team_ids match a player_id, the RIGHT JOIN looks the same as the INNER JOIN. If teams had an entry with a team_id not present in players, that row would appear with NULLs for player columns.
pd.read_sql('''
SELECT players.player_name, teams.team_name, players.goals
FROM players
RIGHT JOIN teams
ON players.player_id = teams.team_id;
''', connection)
player_name team_name goals
0 Messi Inter Miami 10
1 Vini Jr Real Madrid 8
2 Neymar Al Hilal 6
3 Mbappe Real Madrid 5
4 Lewandowski Bayern 4

Full outer join

  • The FULL OUTER JOIN returns all records when there is a match in either the left (first) or right (second) table.
  • It returns NULL values for columns from the table where there is no match.
  • In my experience, this is the least common type of join.
  • Why? Because it returns all the data from both tables, which can be large and potentially less focused. It can also be more computationally intensive.
  • The syntax is: SELECT columns FROM table1 FULL OUTER JOIN table2 ON table1.column = table2.column
pd.read_sql('''
SELECT players.player_name, teams.team_name, players.goals
FROM players
FULL OUTER JOIN teams
ON players.player_id = teams.team_id;
''', connection)
player_name team_name goals
0 Messi Inter Miami 10
1 Vini Jr Real Madrid 8
2 Neymar Al Hilal 6
3 Mbappe Real Madrid 5
4 Lewandowski Bayern 4
5 Haaland None 5

Try it yourself! 🧠

  • Let’s create two new tables (products, reviews) and insert some data into them. We use REAL for the price in SQLite.
# Create the tables and insert data
cursor.execute('DROP TABLE IF EXISTS reviews;') 
cursor.execute('DROP TABLE IF EXISTS products;')
cursor.execute('''
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_name TEXT NOT NULL,
    price REAL 
);
''')

# Insert products
cursor.execute('''
INSERT INTO products (product_name, price) VALUES
    ('Coffee Maker', 99.99),
    ('Toaster', 29.99),
    ('Blender', 79.99),
    ('Microwave', 149.99),
    ('Air Fryer', 89.99);
''')

cursor.execute('''
CREATE TABLE reviews (
    review_id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_id INT,
    rating INT CHECK (rating BETWEEN 1 AND 5),
    comment TEXT,
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);
''')

# Insert reviews
cursor.execute('''
INSERT INTO reviews (product_id, rating, comment) VALUES
    (1, 5, 'Great coffee maker!'),
    (1, 4, 'Good but expensive'),
    (2, 3, 'Average toaster'),
    (3, 5, 'Best blender ever');
''')
connection.commit()
print("Tables 'products' and 'reviews' created and populated.")
Tables 'products' and 'reviews' created and populated.

Try it yourself! 🧠

  • Now try to merge the products and reviews tables using INNER JOIN and LEFT JOIN.
  • Explain the differences between the two results based on which products appear.
  • Appendix 01

Special joins 🌟

Cross join

  • CROSS JOIN is available in SQL, including SQLite.
  • A cross join does not use any comparisons (like ON) to match rows.
  • Instead, the result is constructed by pairing every row from the first table with every row from the second table (Cartesian product).
  • Useful for generating all possible combinations (e.g., pairing all sizes with all colors).
  • Can be resource-intensive with large tables since the result set size is rows_table1 * rows_table2.
  • Sometimes written using a comma between tables in the FROM clause (older syntax): SELECT * FROM table1, table2. Using CROSS JOIN is more explicit and recommended.
# Displaying cross join between players and teams
pd.read_sql('''
SELECT players.player_name, teams.team_name
FROM players
CROSS JOIN teams
ORDER BY players.player_id, teams.team_id; -- Added team_id to ordering for consistency
''', connection)
player_name team_name
0 Messi Inter Miami
1 Messi Real Madrid
2 Messi Al Hilal
3 Messi Real Madrid
4 Messi Bayern
5 Vini Jr Inter Miami
6 Vini Jr Real Madrid
7 Vini Jr Al Hilal
8 Vini Jr Real Madrid
9 Vini Jr Bayern
10 Neymar Inter Miami
11 Neymar Real Madrid
12 Neymar Al Hilal
13 Neymar Real Madrid
14 Neymar Bayern
15 Mbappe Inter Miami
16 Mbappe Real Madrid
17 Mbappe Al Hilal
18 Mbappe Real Madrid
19 Mbappe Bayern
20 Lewandowski Inter Miami
21 Lewandowski Real Madrid
22 Lewandowski Al Hilal
23 Lewandowski Real Madrid
24 Lewandowski Bayern
25 Haaland Inter Miami
26 Haaland Real Madrid
27 Haaland Al Hilal
28 Haaland Real Madrid
29 Haaland Bayern

Cross join

  • Here’s another example generating T-shirt combinations. SQLite uses || for string concatenation, not CONCAT().
# Drop and recreate tables
cursor.execute('DROP TABLE IF EXISTS colors;')
cursor.execute('DROP TABLE IF EXISTS sizes;')
cursor.execute('CREATE TABLE colors (color_name TEXT);')
cursor.execute('CREATE TABLE sizes (size_code TEXT);')
cursor.execute("INSERT INTO colors VALUES ('Black'), ('Red');")
cursor.execute("INSERT INTO sizes VALUES ('S'), ('M');")
connection.commit()

# Perform cross join and concatenate strings using ||
pd.read_sql('''
SELECT
    colors.color_name,
    sizes.size_code,
    colors.color_name || ' - ' || sizes.size_code as t_shirt -- Use || for concatenation
FROM colors
CROSS JOIN sizes
ORDER BY colors.color_name, sizes.size_code DESC;
''', connection)
color_name size_code t_shirt
0 Black S Black - S
1 Black M Black - M
2 Red S Red - S
3 Red M Red - M

Self join

  • A self join is a regular join, but the table is joined with itself (!) 🤯

  • It may not be immediately apparent how this could be useful, but it actually has many applications.

  • Often, tables describe entities that can have relationships with other entities of the same type within that table.

  • For instance, if you have a table of employees, each row could contain a manager_id column that references the employee_id of another employee in the same table.

  • A self join allows you to connect these related rows, for example, to list each employee alongside their manager’s name.

  • Since self joins reference the same table twice, table aliases are required to distinguish between the two instances of the table.

  • You could join the employees table like this: employees AS e JOIN employees AS m ON e.manager_id = m.employee_id.

  • This way, you can clearly specify which instance (e for employee, m for manager) you are referring to in the SELECT list and the ON condition.

  • The general syntax is: SELECT columns FROM table1 AS alias1 JOIN table1 AS alias2 ON alias1.column = alias2.column

Self join

  • Let’s see an example with a family table where mother_id refers back to person_id.
cursor.execute('DROP TABLE IF EXISTS family;')
cursor.execute('''
CREATE TABLE family (
    person_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    mother_id INT -- Refers to family.person_id
);
''')

cursor.execute('''
INSERT INTO family (name, mother_id) VALUES
    ('Emma', NULL), -- grandmother (id 1)
    ('Sarah', 1),   -- Emma's daughter (id 2)
    ('Lisa', 1),    -- Emma's daughter (id 3)
    ('Tom', 2),     -- Sarah's son (id 4)
    ('Alice', 2);   -- Sarah's daughter (id 5)
''')
connection.commit()

# Self join to find child-mother pairs
pd.read_sql('''
SELECT
    children.name as child,
    mothers.name as mother
FROM family AS children
JOIN family AS mothers ON children.mother_id = mothers.person_id
ORDER BY mothers.name, children.name; -- Added child name to ordering
''', connection)
child mother
0 Lisa Emma
1 Sarah Emma
2 Alice Sarah
3 Tom Sarah

Self join

  • Let’s see another example using the players table.
  • Here we want to compare the goals of every player against every other player. We use p1.player_id < p2.player_id to avoid duplicate pairs (e.g., Messi vs Vini Jr. and Vini Jr. vs Messi) and comparing a player to themselves.
pd.read_sql('''
SELECT
    p1.player_name,
    p1.goals,
    p2.player_name as compared_to,
    p2.goals as their_goals,
    p1.goals - p2.goals as difference
FROM players AS p1
JOIN players AS p2
ON p1.player_id < p2.player_id -- Avoid duplicates and self-comparison
ORDER BY difference DESC;
''', connection)
player_name goals compared_to their_goals difference
0 Messi 10 Lewandowski 4 6
1 Messi 10 Mbappe 5 5
2 Messi 10 Haaland 5 5
3 Messi 10 Neymar 6 4
4 Vini Jr 8 Lewandowski 4 4
5 Vini Jr 8 Mbappe 5 3
6 Vini Jr 8 Haaland 5 3
7 Messi 10 Vini Jr 8 2
8 Vini Jr 8 Neymar 6 2
9 Neymar 6 Lewandowski 4 2
10 Neymar 6 Mbappe 5 1
11 Neymar 6 Haaland 5 1
12 Mbappe 5 Lewandowski 4 1
13 Mbappe 5 Haaland 5 0
14 Lewandowski 4 Haaland 5 -1

Join conditions 🧩

Join conditions: ON vs NATURAL JOIN

  • The most standard way of defining the conditions for table joins is with the ON clause.
  • ON uses an expression (usually equality col1 = col2) to specify exactly how rows from the two tables should be matched.
  • SQLite uses these conditions to stitch together the rows from each table.
  • ON is explicit and very flexible, allowing complex conditions (e.g., multiple AND conditions, non-equality checks).
# Example using ON (same as Inner Join example)
pd.read_sql('''
SELECT players.player_name, teams.team_name, players.goals
FROM players
JOIN teams
ON players.player_id = teams.team_id;
''', connection)
player_name team_name goals
0 Messi Inter Miami 10
1 Vini Jr Real Madrid 8
2 Neymar Al Hilal 6
3 Mbappe Real Madrid 5
4 Lewandowski Bayern 4

Natural join

  • This is a feature available in SQL, including SQLite.
  • A NATURAL JOIN does not specify any join columns using ON or USING.
  • Instead, SQLite automatically joins the tables based on all columns that have the same name in both tables.
  • It’s very concise but can be risky if tables unexpectedly share column names that aren’t intended for joining. Use with caution!
# Using the 'teams' table recreated above which has 'player_id'
# NATURAL JOIN will automatically use 'player_id' because it's the only common column name
pd.read_sql('''
SELECT player_name, team_name, goals
FROM players
NATURAL JOIN teams; -- Automatically joins on player_id
''', connection)
player_name team_name goals
0 Messi Inter Miami 10
1 Messi Real Madrid 10
2 Messi Al Hilal 10
3 Messi Real Madrid 10
4 Messi Bayern 10
5 Vini Jr Inter Miami 8
6 Vini Jr Real Madrid 8
7 Vini Jr Al Hilal 8
8 Vini Jr Real Madrid 8
9 Vini Jr Bayern 8
10 Neymar Inter Miami 6
11 Neymar Real Madrid 6
12 Neymar Al Hilal 6
13 Neymar Real Madrid 6
14 Neymar Bayern 6
15 Mbappe Inter Miami 5
16 Mbappe Real Madrid 5
17 Mbappe Al Hilal 5
18 Mbappe Real Madrid 5
19 Mbappe Bayern 5
20 Lewandowski Inter Miami 4
21 Lewandowski Real Madrid 4
22 Lewandowski Al Hilal 4
23 Lewandowski Real Madrid 4
24 Lewandowski Bayern 4
25 Haaland Inter Miami 5
26 Haaland Real Madrid 5
27 Haaland Al Hilal 5
28 Haaland Real Madrid 5
29 Haaland Bayern 5

Merge tables by row 🧩

Union

  • The UNION operator combines the result sets of two or more SELECT statements vertically (stacking rows).
  • It automatically removes duplicate rows between the combined results.
  • The columns in each SELECT statement must be compatible: same number of columns, and corresponding columns must have compatible data types.
# Select player names and team names as a single list of names
# Use NULL for the second column in the second SELECT to match the structure
pd.read_sql('''
SELECT player_name, goals FROM players
UNION
SELECT team_name, NULL FROM teams -- Match column structure
ORDER BY player_name; -- Order the combined result
''', connection)
player_name goals
0 Al Hilal NaN
1 Bayern NaN
2 Haaland 5.0
3 Inter Miami NaN
4 Lewandowski 4.0
5 Mbappe 5.0
6 Messi 10.0
7 Neymar 6.0
8 Real Madrid NaN
9 Vini Jr 8.0

Union all and intersect

  • Similar to UNION, UNION ALL also merges tables by rows (stacks results vertically).
  • Unlike UNION, UNION ALL retains all duplicate rows. It simply appends the results. It’s generally faster than UNION as it doesn’t need to check for duplicates.
  • Let’s add a duplicate name to demonstrate.
# Temporarily insert a player with a name that matches a team name
cursor.execute("INSERT INTO players (player_name, goals, victories) VALUES ('Real Madrid', 0, 0);")
connection.commit()

print(pd.read_sql('''
SELECT player_name FROM players
UNION ALL
SELECT team_name FROM teams;
''', connection))

# Clean up the temporary insert
cursor.execute("DELETE FROM players WHERE player_name = 'Real Madrid' AND goals = 0;")
connection.commit()
    player_name
0         Messi
1       Vini Jr
2        Neymar
3        Mbappe
4   Lewandowski
5       Haaland
6   Real Madrid
7   Inter Miami
8   Real Madrid
9      Al Hilal
10  Real Madrid
11       Bayern

Union all and intersect

  • The INTERSECT operator returns only the rows that are common to the result sets of both SELECT statements. It finds the intersection.
  • Like UNION, it removes duplicates within the final result.
# Temporarily insert a player with a name that matches a team name
cursor.execute("INSERT INTO players (player_name, goals, victories) VALUES ('Real Madrid', 0, 0);")
connection.commit()

print("INTERSECT (common rows only):")
print(pd.read_sql('''
SELECT player_name FROM players
INTERSECT
SELECT team_name FROM teams;
''', connection))

# Clean up the temporary insert
cursor.execute("DELETE FROM players WHERE player_name = 'Real Madrid' AND goals = 0;")
connection.commit()
INTERSECT (common rows only):
   player_name
0  Real Madrid

Except

  • EXCEPT returns the rows from the first SELECT statement’s result set that are not present in the second SELECT statement’s result set. It finds the difference.
  • Like UNION, it removes duplicates before returning the final result.
# Temporarily insert a player with a name that matches a team name
try:
    cursor.execute("INSERT INTO players (player_name, goals, victories) VALUES ('Real Madrid', 0, 0);")
    connection.commit()
except sqlite3.IntegrityError:
     print("Duplicate player name might already exist.")

print("EXCEPT (rows in first query but not in second):")
print(pd.read_sql('''
SELECT player_name FROM players
EXCEPT
SELECT team_name FROM teams;
''', connection))

# Clean up the temporary insert
cursor.execute("DELETE FROM players WHERE player_name = 'Real Madrid' AND goals = 0;")
connection.commit()
EXCEPT (rows in first query but not in second):
   player_name
0      Haaland
1  Lewandowski
2       Mbappe
3        Messi
4       Neymar
5      Vini Jr

Merge operator (SQLite Alternative)

Merge UPSERT (INSERT ... ON CONFLICT)

  • The MERGE statement, common in other databases like PostgreSQL and SQL Server, is not supported by SQLite.
  • SQLite provides a powerful alternative for “UPSERT” (Update or Insert) operations using the INSERT ... ON CONFLICT clause (since version 3.24.0, Oct 2018).
  • This allows you to attempt an INSERT, and if it violates a constraint (like UNIQUE or PRIMARY KEY), you can specify an alternative action, typically an UPDATE.
  • Let’s see a simplified example: If we try to insert a player with an existing player_name, we update their goals instead.
# Add a UNIQUE constraint to player_name for the demo
# This requires recreating the table in SQLite
cursor.execute('DROP TABLE IF EXISTS players;')
cursor.execute('''
CREATE TABLE players (
    player_id INTEGER PRIMARY KEY AUTOINCREMENT,
    player_name TEXT NOT NULL UNIQUE,
    goals INT NOT NULL,
    victories INT NOT NULL
);
''')
# Re-insert initial data
cursor.execute('''
INSERT INTO players (player_name, goals, victories) VALUES
('Messi', 10, 5), ('Vini Jr', 8, 4), ('Neymar', 6, 3),
('Mbappe', 5, 2), ('Lewandowski', 4, 1), ('Haaland', 5, 3);
''')
connection.commit()

Conclusion 📖

Conclusion

  • Today we learned about different types of joins in SQL (INNER, LEFT, RIGHT, FULL OUTER), noting potential version requirements in SQLite for RIGHT and FULL OUTER.
  • We also learned about special joins: CROSS JOIN for combinations and SELF JOIN for relating rows within the same table.
  • We saw how to merge tables vertically (by row) with UNION, UNION ALL, INTERSECT, and EXCEPT.
  • We also learned about different ways to specify join conditions: ON, USING, and NATURAL JOIN.
  • And we saw the SQLite alternative to MERGE, the INSERT ... ON CONFLICT (UPSERT) clause 🚀.

And that’s all for today! 🎉

Thank you and have a great rest of your day! 🙏

Appendix 01

  • Here is the solution to the exercise comparing INNER JOIN and LEFT JOIN for products and reviews.
print("INNER JOIN Results (Only products with reviews):")
# INNER JOIN only includes products that have at least one review.
# Products like 'Microwave' and 'Air Fryer' are excluded.
print(pd.read_sql('''
    SELECT p.product_name, r.rating, r.comment
    FROM products p
    INNER JOIN reviews r ON p.product_id = r.product_id
    ORDER BY p.product_id, r.review_id; -- Added review_id for consistent ordering
''', connection))
INNER JOIN Results (Only products with reviews):
   product_name  rating              comment
0  Coffee Maker       5  Great coffee maker!
1  Coffee Maker       4   Good but expensive
2       Toaster       3      Average toaster
3       Blender       5    Best blender ever
print("\nLEFT JOIN Results (All products, reviews where available):")
print(pd.read_sql('''
    SELECT p.product_name, r.rating, r.comment
    FROM products p
    LEFT JOIN reviews r ON p.product_id = r.product_id
    ORDER BY p.product_id, r.review_id;
''', connection))

LEFT JOIN Results (All products, reviews where available):
   product_name  rating              comment
0  Coffee Maker     5.0  Great coffee maker!
1  Coffee Maker     4.0   Good but expensive
2       Toaster     3.0      Average toaster
3       Blender     5.0    Best blender ever
4     Microwave     NaN                 None
5     Air Fryer     NaN                 None

Back to exercise

Appendix 02

  • Here is the solution to the self-join exercise comparing player victories.
  • Note the use of CAST(... AS REAL) or multiplying by 1.0 to ensure floating-point division in SQLite.
print("Self Join on Players to Compare Victories:")
print(pd.read_sql('''
SELECT
    p1.player_name,
    p1.victories,
    p2.player_name AS compared_to,
    p2.victories AS their_victories,
    -- Ensure floating point division by casting one operand to REAL or NUMERIC
    ROUND(CAST(p1.victories AS REAL) / p2.victories, 2) AS victories_ratio
FROM players p1
JOIN players p2
    ON p1.player_id < p2.player_id -- Avoid duplicates and self-comparison
WHERE
    p2.victories > 0 -- Avoid division by zero
ORDER BY
    p1.player_id, p2.player_id; -- Consistent ordering
''', connection))
Self Join on Players to Compare Victories:
    player_name  victories  compared_to  their_victories  victories_ratio
0         Messi          5      Vini Jr                4             1.25
1         Messi          5       Neymar                3             1.67
2         Messi          5       Mbappe                2             2.50
3         Messi          5  Lewandowski                1             5.00
4         Messi          5      Haaland                3             1.67
5       Vini Jr          4       Neymar                3             1.33
6       Vini Jr          4       Mbappe                2             2.00
7       Vini Jr          4  Lewandowski                1             4.00
8       Vini Jr          4      Haaland                3             1.33
9        Neymar          3       Mbappe                2             1.50
10       Neymar          3  Lewandowski                1             3.00
11       Neymar          3      Haaland                3             1.00
12       Mbappe          2  Lewandowski                1             2.00
13       Mbappe          2      Haaland                3             0.67
14  Lewandowski          1      Haaland                3             0.33

Back to exercise

Appendix 03

Cleaning the SQLite database

  • Unlike server-based databases like PostgreSQL, SQLite databases are typically single files.

  • “Cleaning” the database often means simply deleting the database file and starting fresh.

  • If you encounter issues, the easiest way to reset is to close any active connections to the database file and then remove the file using your operating system or Python’s os module.

  • Here’s how you can close the connection and delete the database file (lecture19.db) we used in this lecture using Python:

# Ensure the connection is closed first
try:
    connection.close()
    print("SQLite connection closed.")
except Exception as e:
    print(f"Error closing connection (might be already closed): {e}")
SQLite connection closed.

Back to the lecture